Hyperbolic smoothing clustering and minimum distance methods

ABSTRACT

The invention concerns four methodologies regarding the unsupervised clustering of a set of observations in multidimensional space, considering a defined number of clusters. The invention comprises a special procedure for calculating the minimum distance of a given point to a set of points in a multidimensional space, the main component of the first methodology.

TECHNICAL FIELD

The invention concerns four methodologies regarding the unsupervised clustering of a set of observations in multidimensional space, considering a defined number of dusters. The invention comprises a special procedure for calculating the minimum distance of a given point to a set of points in a Euclidian multidimensional space, the main component of the first methodology.

Clustering is an important problem that appears in a broad spectrum of applications, whose intrinsic characteristics engender many approaches to this problem, as described by R. C. Dubes and A. K. Jain in “Cluster Techniques: The User's Dilemma”, Pattern Recognition No. 8 pp. 247-260, 1976 by A. K. Jain and R. C. Dubes in “Algorithms for Clustering Data”, Prentice-Hall Inc., Upper Saddle River, N.J., 1988 and by P. Hansen and B. Jaumard, “Cluster Analysis and Mathematical Programming”, Mathematical Programming No. 79 pp. 191-215, 1997.

Clustering analysis has been used traditionally in disciplines such as: biology, biometry, psychology, psychiatry, medicine, geology, marketing and finance. Clustering is also a fundamental tool in modern technology applications, such as: pattern recognition, data mining, web mining, image processing, machine learning and knowledge discovering.

Clustering analysis can be done according to numerous criteria, through different mathematical formulations. The methodologies considered here deal with clustering problems that have a common component: the measure of a distance, which can be done following different metrics.

The first methodology, called hyperbolic smoothing, has a wider scope, and can be applied to clustering according to distances measured in different metrics, such as those known as Euclidian, Minkowski, Manhattan and Chebychev norms. Furthermore, this first methodology is the basis for the other ones.

The second methodology considers the partition of the set of observations into two groups: “data in the frontier” and “data in gravitational regions”. This method is particularly useful in combination with the first method. The resulting effect is a desirable substantial reduction of the computational effort necessary to solve the clustering problem.

The third methodology involves a specific combination of the two previous methodologies considering the case of the Euclidean norm. It achieves a faster response time when compared to the second methodology, while keeping the full robustness of the first methodology.

The fourth methodology does the same as the third methodology for the L1 norm, keeping the robustness and increasing the speed of the response.

The fifth methodology considers the calculation of the minimum distance of a given point to a set of points in a Euclidian multidimensional space in a completely differentiable way. It is the kernel component of the first methodology above, and can be applied to a broader class of problems.

General Description of the Clustering Procedure

Cluster analysis deals with problems of classification of a set of patterns or observations, in general represented as points in a multidimensional space, into clusters, following two basic and simultaneous objectives: patterns in the same cluster must be similar to each other (homogeneity objective) and also different from patterns in any other duster (separation objective), as shown by J. A. Hartingan in “Clustering Algorithms”, John Wiley and Sons, Inc., New York, N.Y., 1975 and by H. Späth in “Cluster Analysis Algorithms for Data Reduction and Classification”, Ellis Horwood, Upper Saddle River, N.J., 1980.

There are two main strategies for solving clustering problems: hierarchical clustering methods and partition clustering methods. Hierarchical methods produce a hierarchy of partitions of a set of observations. Partition methods, in general, assume a given number of clusters and, essentially, seek the optimization of an objective function measuring the homogeneity within the clusters and/or the separation between the dusters.

The present invention is related to solving unsupervised clustering problems following the general formulation presented below.

Let S={s₁, . . . , s_(n)} denote a set of m patterns or observations from a Euclidean n-space to be clustered into a given number q of disjoint clusters. To formulate the original clustering problem as a min-sun-min problem, we proceed as follows. Let x_(i), i=1, . . . , q be the centroids of the clusters, where each x_(i)εR″. The set of these centroid coordinates will be represented by XεR^(nq). Given a point s_(j) of

$\begin{matrix} {{z_{j} = {\min\limits_{x_{i} \in X}{{s_{j} - x_{i}}}}},} & (1) \end{matrix}$

where ∥s_(j)−x_(i)∥ is the distance following a given metric.

The measurement of the quality of a clustering associated to a specific position, of q centroids is provided by a function of the set of these distances.

$\begin{matrix} {{D(X)} = {\sum\limits_{j = 1}^{m}{f\left( z_{j} \right)}}} & (2) \end{matrix}$

where ƒ(z_(j)) is an arbitrary monotonic increasing function of the distance z_(j). Some trivial examples of this kind of function are: Σ_(j=1) ^(m)z_(i) or Σ_(j=1) ^(m)z_(i) ².

The optimal placing of the centroids must provide the best quality of this measurement. Therefore, if X* denotes an optimal placement, then the problem is

$\begin{matrix} {X^{*} = {\arg \; {\min\limits_{X \in R^{nq}}{D(X)}}}} & (3) \end{matrix}$

where X is the set of all placements of the q centroids. Using (1)-(3), we finally arrive at

$\begin{matrix} {X^{*} = {\underset{X \in R^{nq}}{argmin}{\sum\limits_{j = 1}^{m}{{f\left( {\min\limits_{x_{i} \in X}{{s_{j} - x_{i}}}} \right)}.}}}} & (4) \end{matrix}$

General Description of the Minimum Distance Procedure

The minimum distance of a given point to a set of points in a Euclidian multidimensional space, as considered by equation (1), is a common component in a broad class of mathematical problems.

The present invention proposes an approximate calculation of this distance in a completely differentiable way.

Previous Techniques

In the cluster analysis scope, algorithms traditionally use two main strategies: hierarchical clustering methods and partition clustering methods as described by P. Hansen and B. Jaumard in “Cluster Analysis and Mathematical Programming”, Mathematical Programming No. 79 pp. 191-215. 1997 and by A. K. Jain, M. N. Murty and P. J. Flynn in “Data Clustering: A Review”, ACM Computing Surveys, Vol. 31, No. 3, September 1999.

Hierarchical methods, essentially heuristic procedures, produce a hierarchy of partitions of the set of observations according to an agglomerative strategy or to a divisive one. In the former case, the general algorithm starts from an initial partition, in which each cluster contains one pattern, and successively merges two clusters on the basis of a similarity measure, until all patterns are in the same cluster. In the latter case, the general algorithm starts from an initial partition with all patterns in the same cluster and, by successive bipartitions, reaches a partition in which each duster contains one single pattern. In both strategies, the best partition is chosen, by a suitable criterion, from the hierarchy of partitions obtained.

Partition methods, in general, assume a given number of clusters and, essentially, seek the optimization of an objective function measuring the homogeneity within the dusters and/or the separation between the clusters. Heuristic algorithms of the exchange type, as the traditional k-means approach, are frequently used to find a local minimum of the objective function.

The k-means is one of most used tools in data mining applications, as presented by X. Wu et alli in “Top 10 algorithms in data mining”, Knowledge and Information Systems, Springer, 14, pp 1-37, 2008. It has been studied by many authors as presented by H. H. Bock in “Clustering methods: A History of k-means Algorithms”, in “Selected Contributions in Data Analysis and Classification”, editor P. Brito, G. Cucunel, P. Bertrand and F. Carvalho, Springer, pp 161-172, 2007. A simple version is presented in J. Mc Queen, “Some Methods for Classification and Analysis of Multivariate Observations”, in Proceedings of the Fifth Berkeley Symposiums on Mathematical Statistics and Probability, pp. 281-297, 1967 or variations as presented in M. R. Anderberg, “Cluster Analysis for Applications”, Academic Press Inc., New York, 1973, and H. Späth, “Cluster Analysis Algorithms for Data Reduction and Classification”, Ellis Horwood, Upper Saddle River, N.J., 1980. For instances with a moderate number of observations (m<3000), k-means algorithms have a fast performance. However, the results are strongly dependent on the initial choice of the starting point.

Moreover, several mathematical programming techniques have been applied, with a limited success, to solving the global optimization problem:

-   dynamic programming as presented in R. E. Jensen, “A Dynamic     Programming Algorithm for Clustering Analysis”, Operations Research     No. 17 pp. 1043-1057, 1969; -   branch and bound as presented in W. L. G. Koontz, P. M. Narendra     and P. M. Fukuraga, “A Branch and Bound Clustering Algorithm”, IEEE     Transactions on Computers No. 24 pp. 908-915, 1975; -   interior point algorithms as presented in O. du Merle, P. Hansen, B.     Jaumard and V. Mladenovic, “An Interior Point Algorithm for Minimum     Sum-of-Squares Clustering”, SIAM Journal on Scientific Computing No.     21 pp. 1485-1505, 2000; -   bilinear programming as presented in O. L. Mangasarian,     “Mathematical Programming in Data Mining”, Data Mining and Knowledge     Discovery, Vol. 1, No. 2, pp 183-201, 1997; -   all kinds of metaheuristics as presented in, for example: C. R.     Reeves, “Modern Heuristics Techniques for Combinatorial Problems”,     Blackwell, London, 1993, or J. Pacheco and O. Valencia, “Design of     Hybrids for Minimum Sum-of-Squares Clustering Problem”,     Computational Statistics and Data Analysis No. 43 pp. 235-248, 2003; -   nonsmooth optimization as presented in A. M. Bagirov and J.     Yearwood, “A New Nonsmooth Optimization Algorithm for Minimum     Sum-of-Squares Clustering Problems”, European Journal of Operational     Research, 170 pp. 578-596, 2006.

DETAILED DESCRIPTION OF THE INVENTION

The detailed description is divided in five major parts: General Hyperbolic Smoothing Clustering Methodology, Boundary and Gravitational Regions Partition Methodology, Boundary and Gravitational Regions Partition Methodology Applied to the Euclidian Metric, Boundary and Gravitational Regions Partition Methodology Applied to the Manhattan Metric and the Hyperbolic Smoothing Minimum Distance procedure.

General Hyperbolic Smoothing Clustering Methodology

The core focus of the first methodology is the smoothing of a min-sun-min problem engendered by the clustering specification. In a sense, the process whereby this is achieved is an extension of a smoothing scheme, called Hyperbolic Smoothing, used in different contexts for solving nondifferentiable problems in general as presented in A. B. A. Santos, “Problemas de Programaçāo Não-Diferenciável: Uma Metodologia de Suavização”, M. Sc. thesis-COPPE-UFRJ, Rio de Janeiro, 1997 for solving the min-max problem as presented in A. M. V. Chaves, “Resolução do Problema Minimax Via Suavização”, M. Sc. Thesis-COPPE-UFRJ, Rio de Janeiro, 1997 and for the covering of plane domains by circles as presented in A. E. Xavier and A. A. F. Oliveira, “Optimal Covering of Plane Domains by Circles Via Hyperbolic Smoothing”, Journal of Global Optimization 31 (3), Kluwer, 2005. This technique was developed through an adaptation of the hyperbolic penalty method originally presented in A. E. Xavier, “Penalização Hiperbólica: Um Novo Método para Resolução de Problemas de Otimização”, M. Sc. Thesis-COPPE-UFRJ, Rio de Janeiro, 1982.

By smoothing we fundamentally mean the substitution of an intrinsically nondifferentiable two-level problem by a C^(∞) differentiable single-level alternative. This is achieved through the solution of a sequence of differentiable subproblems which gradually approaches the original problem. In the present application, each subproblem, through the use of the Implicit Function Theorem, can be transformed into a low dimension unconstrained one, which, owing to its being indefinitely differentiable, can be comfortably solved by using the most powerful and efficient algorithms, such as conjugate gradient, quasi-Newton or Newton methods.

Problem (4) above can be formulated equivalently as

$\begin{matrix} {{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}{f\left( z_{j} \right)}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} z_{j}} = {\min\limits_{{i = 1},L,q}{{s_{j} - x_{i}}}}},{j = 1},L,{m.}}} & (5) \end{matrix}$

Considering its definition, each z_(j) must necessarily satisfy the following set of inequalities:

z _(j) −∥s _(j) −x _(i)∥₂≦0, i=1,L,q.  (6)

Substituting these inequalities for the equality constraints of problem (5), the relaxed problem becomes

$\begin{matrix} {{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}{f\left( z_{j} \right)}}}{{{{{subject}\mspace{14mu} {to}\mspace{14mu} z_{j}} - {{s_{j} - x_{i}}}_{2}} \leq 0},{j = 1},L,m}{{i = 1},L,{q.}}} & (7) \end{matrix}$

Since the variables z_(j) are not bounded from below, the optimum solution of the relaxed problem will be z_(j)=0, j=1,Λ,m. In order to obtain the desired equivalence, we must, therefore, modify problem (7). We do so by first letting φ(y) denote max{0,y} and then observing that, from the set of inequalities in (7), it follows that

$\begin{matrix} {{{\sum\limits_{i = 1}^{q}{\phi \left( {z_{j} - {{s_{j} - x_{i}}}} \right)}} = 0},{j = 1},L,m} & (8) \end{matrix}$

For fixed j and assuming d₁<Λ<d_(q) with d_(i)=∥s_(j)−x_(i)∥₂, FIG. 1 illustrates the first three summands of (8) as a function of z_(j).

Using (8) in place of the set of inequality constraints in (7), we would obtain an equivalent problem maintaining the undesirable property that z_(j), j=1,Λ,m still has no lower bound. Considering, however, that the objective function of problem (7) will force each z_(j), j=1,Λ,m downward, we can think of bounding the latter variables from below by considering “>” in place of “=” in (8) and considering the resulting “non-canonical” problem

$\begin{matrix} {{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}{f\left( z_{j} \right)}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} {\sum\limits_{i = 1}^{q}{\phi \left( {z_{j} - {{s_{j} - x_{i}}}} \right)}}} > 0},{j = 1},L,{m.}}} & (9) \end{matrix}$

The canonical formulation can be recovered from (9) by perturbing (8) and considering the modified problem:

$\begin{matrix} {{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}{f\left( z_{i} \right)}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} {\sum\limits_{i = 1}^{q}{\phi \left( {z_{j} - {{s_{j} - x_{i}}}} \right)}}} \geq ɛ},{j = 1},L,m}} & (10) \end{matrix}$

for ε>0. Since the feasible set of problem (9) is the limit of (10) when ε→0⁺ we can then consider solving (9) by solving a sequence of problems like (10) with a sequence of decreasing values for ε approaching 0.

Analyzing problem (10), the definition of function φ turns it into an extremely rigid nondifferentiable structure, which makes its computational solution very hard. In view of this, the numerical method we adopt for solving problem (1) takes a smoothing approach. From this perspective, let us define the function:

$\begin{matrix} {{\varphi \left( {y,\tau} \right)} = \frac{\left( {y + \sqrt{y^{2} + \tau^{2}}} \right)}{2}} & (11) \end{matrix}$

For y,τεR and τ>0.

Function φ has the following properties:

(a) φ(y,τ)>φ(y), ∀τ>0;

${{(b)\mspace{14mu} {\lim\limits_{\tau\rightarrow 0}{\varphi \left( {y,\tau} \right)}}} = {\phi (y)}};$

(c) φ(y,τ) is an increasing convex C^(∞) function in variable y.

Therefore, function φ constitutes an approximation of function φ. Adopting the same assumptions used in FIG. 1, the first three summands of (8) and their corresponding smoothed approximations, given by (11), are depicted in FIG. 2.

By using function φ in the place of function φ, the problem turns into

$\begin{matrix} {{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}\; {f\left( z_{j} \right)}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} {\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - {{s_{j} - x_{i}}}},\tau} \right)}}} \geq ɛ},{j = 1},\Lambda,m}} & (12) \end{matrix}$

The FIG. 2 shows the original and smoothed summands in (8)

Smoothing Metrics

To obtain a differentiable problem, it is still necessary to smooth the distance ∥s_(j)−x_(i)∥. The smoothing procedures for the mostly used metrics are presented: L₂ (Euclidian), L_(p) (Minkowski), L₁ (Manhattan) and L_(∞) (Chebychev).

Euclidian Metric

For the Euclidian distance associated to the L₂ metric, let us define the function

$\begin{matrix} {{\theta_{2}\left( {s_{j},x_{i},\gamma} \right)} = \sqrt{{\sum\limits_{l = 1}^{n}\; \left( {s_{j}^{l} - x_{i}^{l}} \right)^{2}} + \gamma^{2}}} & (13) \end{matrix}$

for γ>0.

Function θ₂ has the following properties:

${{(a)\mspace{14mu} {\lim\limits_{\tau\rightarrow 0}{\theta_{2}\left( {s_{j},x_{i},\gamma} \right)}}} = {{s_{j} - x_{i}}}_{2}};$

(b) θ₂ is a C^(∞) function.

Minkowski Metric

For the case of distances associated to the L_(p) metric, let us define the function

$\begin{matrix} {{\theta_{p}\left( {s_{j},x_{i},\gamma} \right)} = \left( {{\sum\limits_{l = 1}^{n}\; \left( {s_{j}^{l} - x_{i}^{l}} \right)^{p}} + \gamma^{p}} \right)^{\frac{1}{p}}} & (14) \end{matrix}$

for γ>0:

Function θ_(p) has the following properties:

${{(a)\mspace{14mu} {\lim\limits_{\tau\rightarrow 0}{\theta_{p}\left( {s_{j},x_{i},\gamma} \right)}}} = {{s_{j} - x_{i}}}_{p}};$

(b) θ_(p) is a C^(∞) function.

Manhattan Metric

For the case of distances associated to the L₁ metric, let us define the function

$\begin{matrix} {{\theta_{1}\left( {s_{j},x_{i},\gamma} \right)} = {\sum\limits_{l = 1}^{n}\; \sqrt{\left( {s_{j}^{l} - x_{i}^{l}} \right)^{2} + \gamma^{2}}}} & (15) \end{matrix}$

for γ>0.

Function θ₁ has the following properties:

${{(a)\mspace{14mu} {\lim\limits_{\tau\rightarrow 0}{\theta_{1}\left( {s_{j},x_{i},\gamma} \right)}}} = {{s_{j} - x_{i}}}_{1}};$

(b) θ₁ is a C^(∞) function.

Chebychev Metric

For the case of distances associated to the L_(∞) metric, there is a difference in the problem formulation. As the required modification relative to the preceding metrics has a similar structure, it will not be considered as a separate case. It is sufficient to take the constraints:

${{\sum\limits_{l = 1}^{n}\; {\varphi \left( {{{{s_{j}^{l} - x_{i}^{l}}} - z_{ij}},\tau} \right)}} \leq ɛ},{i = 1},\ldots \mspace{14mu},q$ j = 1, Λ, m ${{\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - z_{ij}},\tau} \right)}} \geq ɛ},\mspace{14mu} {j = 1},\Lambda,m$

in the place of constraints of problem (12) and to define the function

θ_(∞)(s _(j) ,x _(i),γ)=√{square root over ((s _(i) ^(l) −x _(i) ^(l))²+γ²)}  (16)

for γ>0.

Function θ_(∞) has the following properties:

${{(a)\mspace{14mu} {\lim\limits_{\tau\rightarrow 0}{\theta_{\infty}\left( {s_{j},x_{i},\gamma} \right)}}} = {{s_{j} - x_{i}}}};$

(b) θ_(∞) is a C^(∞) function.

Completely Differentiable Formulation

By using the appropriate function θ as a measure of the distance ∥s_(j)−x_(i)∥ in accordance with the used metric, the completely differentiable problem:

$\begin{matrix} {{{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}\; {{f\left( z_{j} \right)}\; {\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - {\theta \left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}}}}} \geq ɛ},{j = 1},\Lambda,m} & (17) \end{matrix}$

is now obtained.

So, the properties of functions φ and θ allow us to seek a solution to problem (10) by solving a sequence of sub-problems like problem (17), produced by decreasing the parameters γ→0, τ→0, ε→0.

Since z_(j)≧0, j=1,Λ,m the objective function minimization process will work reducing those values to the utmost. On the other hand, given any set of centroids x_(i), i=1,Λ,q due to property (c) of the hyperbolic smoothing function φ, the constraints of problem (17) are a monotonically increasing function in z_(j). So, these constraints will certainly be active and problem (17) will at least be equivalent to the problem:

$\begin{matrix} {{{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}\; {f\left( z_{j} \right)}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} {h_{j}\left( {z_{j},x} \right)}} = {{{\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - {\theta \left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}} - ɛ} = 0}},{j = 1},L,m}} & (18) \end{matrix}$

The dimension of the variable domain space of problem (18) is (nq+m). As, in general, the value of the parameter m, the cardinality of the set S of the observations, is large, problem (18) has a large number of variables. However, it has a separable structure because each variable z_(j) appears only in one equality constraint. Therefore, as the partial derivative of h_(j)(z_(j),x) with respect to z_(j), j=1,Λ,m is not equal to zero, it is possible to use the Implicit Function Theorem to calculate each component z_(j), j=1,Λ,m as a function of the centroid variables x_(i), i=1,Λ,q. In this way, the unconstrained problem

$\begin{matrix} {{minimize}\mspace{14mu} {\sum\limits_{j = 1}^{m}\; {f\left( {z_{j}(x)} \right)}}} & (19) \end{matrix}$

is obtained, where each z_(j)(x) results from the calculation of a zero of each equation

$\begin{matrix} {{{h_{j}\left( {z_{j},x} \right)} = {{\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - {\theta \left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}} - ɛ}},{j = 1},\Lambda,m} & (20) \end{matrix}$

Due to property (c) of the hyperbolic smoothing function, each term φ above is strictly increasing with variable z_(j) and therefore the equation has a single zero.

Again, due to the Implicit Function Theorem, the functions z_(j)(x) have all derivatives with respect to the variables x_(i), i=1,Λ,q and therefore it is possible to calculate the gradient of the objective function of problem (19)

$\begin{matrix} {{{\nabla{F(x)}} = {\sum\limits_{f = 1}^{m}\; {\frac{{f\left( {z_{j}(x)} \right)}}{z_{j}}{\nabla\; {z_{j}(x)}}}}}{where}} & (21) \\ {{\nabla{z_{j}(x)}} = {{- {\nabla{h_{j}\left( {z_{j},x} \right)}}}/\frac{\partial{h_{j}\left( {z_{j},x} \right)}}{\partial z_{j}}}} & (22) \end{matrix}$

while ∇h_(j)(z_(j),x) and ∂h_(j)(z_(j),x)/∂z_(j) are obtained from equations (11) and (20) and the appropriate equation between (13) and (16). In this way, it is easy to solve problem (19) by making use of any method based on first order derivative information. At last, it must be emphasized that problem (19) is defined on a (nq)-dimensional space, so it is a small problem, since the number of clusters, q, is usually very small for real-world applications. For the Chebychev metric, the preliminary calculation of ∇z_(ij)(x) . . . is necessary in order to calculate ∇z_(j)(x).

The solution of the original clustering problem can be obtained by using the general Hyperbolic Smoothing Clustering Method, described below in a simplified form:

The HSC Method Initialization Step:

Choose initial values: x⁰, γ¹, τ¹, ε¹

Choose values 0<τ₁<1, 0<ρ₂<1, 0<τ₃<1; let k=1.

Main Step: Repeat until an arbitrary stopping rule is attained

Solve problem (19) with γ=γ^(k), τ=τ^(k) and ε=ε^(k), starting at the initial point x^(k−1) and let x^(k) be the solution obtained.

Let γ^(k+1)=ρ₁γ^(k), τ^(k+1)=ρ₂τ^(k), ε^(k+1)=ρ₃ε^(k), k=k+1.

Comments:

Just as in other smoothing methods, the solution to the clustering problem is obtained, in theory, by solving an infinite sequence of optimization problems. In the HSC method, each problem that is minimized is unconstrained and of low dimension.

Notice that the general method causes τ and γ to approach 0, so the constraints of the sub-problems it solves, given as in (17), tend to those of (10). In addition, the general method causes ε to approach 0, so, in a simultaneous movement, the solved problem (10) gradually approaches problem (9).

Boundary and Gravitational Regions Partition Methodology

The calculation of the objective function of the problem (19) demands the determination of the zeros of m equations (20), one equation for each observation point. This is a relevant computational task associated to HSC Method.

In this section is presented a faster procedure for a generic clustering formulation. The basic idea is the partition of the set of observations into two non overlapping parts. By using a conceptual presentation, the first set corresponds to the observation points that are relatively close to two or more centroids. The second set corresponds to the observation points that are significantly close to a unique centroid in comparison to the other ones.

So, the first part J_(B) is the set of boundary observations and the second one is the set J_(G) of gravitational observations. Considering this partition, equation (19) can be calculated in the following way:

$\begin{matrix} {{{minimize}{\sum\limits_{j = 1}^{m}{f\left( z_{j} \right)}}} = {{\sum\limits_{j \in J_{B}}{f\left( z_{j} \right)}} + {\sum\limits_{j \in J_{G}}{f\left( z_{j} \right)}}}} & (23) \end{matrix}$

so that the objective function can be presented in the form:

minimize F(x)=F _(B)(x)+F _(G)(x)  (24)

where the two components F_(B)(x) and F_(G)(x) are completely independent.

The proposed partition scheme has a general validity. In particular, it makes sense when there are advantageous strategies for any of the two components of the objective function. Right away, a natural connection between the hyperbolic smoothing approach and this partition scheme is presented.

The first part of expression (24), associated to the boundary observations, can be calculated by using the previous presented smoothing approach, see (19) and (20):

$\begin{matrix} {{{minimize}\mspace{14mu} {F_{B}(x)}} = {\sum\limits_{j \in J_{B}}{f\left( {z_{j}(x)} \right)}}} & (25) \end{matrix}$

where each z_(j)(x) results from the calculation of a zero of each equation

$\begin{matrix} \begin{matrix} {{{h_{j}\left( {z_{j},x} \right)} = {{\sum\limits_{i = 1}^{q}{\varphi \left( {{z_{j} - {\theta \left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}} - ɛ}},} & {j \in J_{B}} \end{matrix} & (26) \end{matrix}$

The second part of expression (24) can be calculated by using a faster procedure, as will be shown right away.

Let us define the two parts in a more rigorous form. Let be x _(i) i=1, . . . , q be a referential position of the centroids of the clusters taken along the iterative process.

The boundary concept in relation to the referential point x can be easily specified by defining a δ band zone between neighboring centroids. For a generic point sεR^(n), let define the first and second nearest distances from s to the centroids following an arbitrary metric:

$\begin{matrix} {{d_{1}\left( {s,\overset{\_}{x}} \right)} = {{{s - {\overset{\_}{x}}_{i_{1}}}} = {\min\limits_{i}{{s - {\overset{\_}{x}}_{i}}}}}} & (27) \\ {{d_{2}\left( {s,\overset{\_}{x}} \right)} = {{{s - {\overset{\_}{x}}_{i_{2}}}} = {\min\limits_{i \neq i_{1}}{{s - {\overset{\_}{x}}_{i}}}}}} & (28) \end{matrix}$

where i₁ and i₂ are the labeling indices of these two nearest centroids.

By using the above definitions, let us define precisely the δ boundary band zone:

Z _(δ) ={sεR ^(n) |d ₂(s, x )−d ₁(s, x )<δ}  (29)

and the gravity region, this is the complementary space:

G _(δ)( x )={sεR ^(n) −Z _(δ)( x )}  (30)

FIG. 4 illustrates in a bi-dimensional space the Z_(δ)( x) and G_(δ)( x) partitions. The central lines form the Voronoy polygon associated to the referential centroids x _(i), i=1, . . . , q. The region between two lines parallel to Voronoy lines constitutes the boundary band zone Z_(δ)( x).

Now, the sets J_(B) and J_(G) can be defined in a precise form:

J _(B)( x )={j=1, . . . ,m|s _(i) εZ _(δ)( x )}  (31)

J _(G)( x )={j=1, . . . ,m|s _(j) εG _(δ)( x )}  (32)

The FIG. 3 shows the Z_(δ)( x) and G_(δ)( x) partitions

Proposition 1:

Let s be a generic point belonging to the gravity region G_(δ)( x), with nearest centroid i₁. Let x be the current position of the centroids. Let Δx=max_(i)=∥x_(i)− x _(i)∥ be the maximum displacement of the centroids.

If Δx<δ then s will continue to be nearest to the centroid x_(il) than to any other one, so

$\begin{matrix} {{\min\limits_{i \neq i_{1}}{{s - x_{i}}}} \geq {{s - x_{i_{1}}}}} & (33) \end{matrix}$

Since δ≧Δx, Proposition 1 makes possible to calculate exactly expression (23) in a faster way. First, let us define the subsets of the gravitational observations associated to each referential centroid x _(i), i=1, . . . , q:

$\begin{matrix} {{J_{i}\left( \overset{\_}{x} \right)} = \left\{ {\left. {j \in J_{G}} \middle| {\min\limits_{l = {1\mspace{11mu} \ldots \mspace{11mu} q}}{{s_{j} - {\overset{\_}{x}}_{l}}}} \right. = {{s_{j} - {\overset{\_}{x}}_{i}}}} \right\}} & (34) \end{matrix}$

The second part of expression (24), associated to the gravitational observations, can indeed be calculated by the following straightforward form that is much easier than the hard calculation of zeros of equations (20:

$\begin{matrix} {{{minimize}\mspace{14mu} {F_{2}(x)}} = {{\sum\limits_{j \in J_{G}}{f\left( {z_{j}(x)} \right)}} = {\sum\limits_{i = 1}^{q}{\sum\limits_{j \in {J_{i}{(\overset{\_}{x})}}}{f\left( {z_{j}(x)} \right)}}}}} & (35) \end{matrix}$

Therefore, if δ≧Δx is observed within the iterative process, the calculation of the expression τ_(jεJ) _(G) ƒ(z_(j)(x)) can be exactly performed by faster procedures.

By using the above results, it is possible to construct a method, the Accelerated Hyperbolic Smoothing Clustering Method, which has conceptual properties that offer a faster computational performance for solving the clustering problem given by formulation (23).

A fundamental question is the proper choice of the boundary parameter δ. Moreover, there are two main options for updating the boundary parameter δ, inside the internal minimization procedure or after it. For simplicity sake, the HSC method connected with the partition scheme presented below adopts the second option, which offers a better computational performance, in spite of an eventual violation of the δ≧Δx condition, which is corrected in the next partition update.

The AHSC Method Initialization Step:

Choose initial start point: x⁰;

Choose parameter values: γ¹, τ¹, ε¹;

Choose reduction factors: 0<ρ₁<1, 0<ρ₂<1, 0<ρ₃<1;

Specify the boundary band width: δ¹

Let k=1.

Main Step: Repeat until an arbitrary stopping rule is attained

For determining the Z_(δ)( x) and G_(δ)( x) partitions, given by (29) and (30), use x=x^(k−1) and δ=δ^(k).

Solve problem (24) starting at the initial point x^(k−1) and let x^(k) be the solution obtained:

For solving the equations (26), associated to the first part given by (25), take the smoothing parameters: γ=γ^(k), τ=τ^(k) and ε=ε^(k);

For solving the second part, given by (35), use a direct procedure considering each gravitational region separately.

Updating procedure:

Let γ^(k+1)=ρ₁γ^(k), τ^(k+1)=ρ₂τ^(k), ε^(k+1)=ρ₃ε^(k)

Redefine the boundary value: δ^(k+1)

Let k:=k+1.

Comments:

The above method does not include any procedure for considering the occurrence of empty gravitational regions. This possibility can be overcome by simply moving the centroids.

The efficiency of the AHSC (HSC Method Connected with the Boundary and Gravitational Regions Partition Scheme) depends strongly on the parameter δ. A choice of a small value for it will imply an improper definition of the set G_(δ)( x) and frequent violation of the basic condition Δx<δ, for the validity of Proposition 1. Otherwise, a choice of a large value will imply a decrease in the number of gravitational observation points and, therefore, the computational advantages given by formulation (35) will be reduced.

As a general strategy, within first iterations, larger δ values must be used, because the centroid displacements are more expressive. The δ values must be gradually decreased in the same proportion of the decrease of these displacements.

Boundary and Gravitational Regions Partition Methodology Applied to the Euclidian Metric

In this section, a particular clustering problem formulation is considered. Among many criteria used in cluster analysis, the most natural, intuitive and frequently adopted criterion is the minimum sum-of-squares clustering (MSSC). This criterion corresponds to the minimization of the sum-of-squares of distances of observations to their cluster means, or equivalently, to the minimization of within-group sum-of-squares. It is a criterion for both the homogeneity and the separation objectives since, according to the Huygens Theorem, minimizing the within-cluster inertia of a partition (homogeneity within the cluster) is equivalent to maximizing the between-cluster inertia (separation between clusters).

The minimum sum-of-squares clustering (MSSC) formulation produces a mathematical problem of global optimization. It is both a nondifferentiable and a nonconvex mathematical problem, with a large number of local minimizers. It is a particular case of the general formulation (5)

$\begin{matrix} {{{minimize}{\sum\limits_{j = 1}^{m}z_{j}^{2}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} z_{j}} = {\min\limits_{{i = 1},L,q}{{s_{j} - x_{i}}}_{2}}},{j = 1},L,m}} & (36) \end{matrix}$

By performing the partition of the set of observations above described into two non overlapping parts, the boundary band zone and the gravitational regions, the problem becomes:

$\begin{matrix} {{{minimize}{\sum\limits_{j = 1}^{m}{z_{j}(x)}^{2}}} = {{\sum\limits_{j \in J_{B}}{z_{j}(x)}^{2}} + {\sum\limits_{j \in J_{G}}{z_{j}(x)}^{2}}}} & (37) \end{matrix}$

so that the objective function can be presented in the form:

minimize F(x)=F _(B)(x)+F _(G)(x)  (38)

where the two components are completely independent.

The first part of expression (38), associated to the boundary observations, can be calculated by using the previous presented smoothing approach, see (19) and (20):

$\begin{matrix} {{{minimize}\mspace{14mu} {F_{B}(x)}} = {\sum\limits_{j \in J_{B}}{z_{j}(x)}^{2}}} & (39) \end{matrix}$

where each z_(j)(x_(i)) results from the calculation of a zero of each equation

$\begin{matrix} {{{{subject}\mspace{14mu} {to}\mspace{14mu} {h_{j}\left( {z_{j},x} \right)}} = {{{\sum\limits_{i = 1}^{q}{\varphi \left( {{z_{j} - {\theta_{2}\left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}} - ɛ} = 0}},{j \in J_{B}}} & (40) \end{matrix}$

The second part of expression (38) can be calculated by using a faster procedure, as will be shown right away.

The center of the observations of each non-empty subset is given by

$\begin{matrix} {{v_{i} = {\frac{1}{J_{i}}{\sum\limits_{s_{j} \in J_{i}}s_{j}}}},{{\forall i} = 1},\ldots \mspace{14mu},q} & (41) \end{matrix}$

Let us consider the second sum of expression (37). It will be calculated taking into reference the centers above defined

$\begin{matrix} {{{minimize}\mspace{14mu} {F_{G}(x)}} = {{\sum\limits_{j \in J_{G}}{z_{j}(x)}^{2}} = {{\sum\limits_{i = 1}^{q}{\sum\limits_{j \in J_{i}}{{s_{j} - x_{i}}}^{2}}} =}}} & (42) \\ {{\sum\limits_{i = 1}^{q}{\sum\limits_{j \in J_{i}}{{s_{j} - v_{i} + v_{i} - x_{i}}}^{2}}} =} & (43) \\ {\sum\limits_{i = 1}^{q}\left\lbrack {{\sum\limits_{j \in J_{i}}{{s_{j} - v_{i}}}^{2}} + {2\left( {v_{i} - x_{i}} \right){\sum\limits_{j \in J_{i}}\left( {s_{j} - v_{i}} \right)}} + {\sum\limits_{j \in J_{i}}{{x_{i} - v_{i}}}^{2}}} \right\rbrack} & (44) \end{matrix}$

Equation (41) implies that

$\begin{matrix} {{{\sum\limits_{j \in J_{i}}\left( {s_{j} - v_{i}} \right)} = 0},{{and}\mspace{14mu} {then}\text{:}}} & (45) \\ {{{minimize}\mspace{14mu} {F_{G}(x)}} = {{\sum\limits_{i = 1}^{q}{\sum\limits_{j \in J_{i}}{{s_{j} - v_{i}}}^{2}}} + {\sum\limits_{i = 1}^{q}{{J_{i}}{{x_{i} - v_{i}}}^{2}}}}} & (46) \end{matrix}$

When the position of centroids x_(i), i=1, . . . , q move within the iterative process, the value of the first sum of (46) assumes a constant value, since the vectors s and v are fixed. On the other hand, for the calculation of the second sum, it is only necessary to calculate q distances, ∥v_(i)−x_(i)∥, i=1, . . . , q.

The gradient of the second part objective function is easily calculated by:

$\begin{matrix} {{\nabla{F_{G}(x)}} = {\sum\limits_{i = 1}^{q}{2{J_{i}}\left( {x_{i} - v_{i}} \right)}}} & (47) \end{matrix}$

where the vector (x_(i)−v_(i)) must be taken in R^(nq), so it has the first (i−1)q components and the last 1=iq+1, . . . , nq components equal zero.

Therefore, if δ≧Δx is observed within the iterative process, the calculation of the expression Σ_(jεJ) _(G) z_(j)(x)² and its gradient can be exactly performed by very fast procedures, equations (46) and (47).

By using the above results, it is possible to construct a specific method, the Accelerated Hyperbolic Smoothing Clustering Method Applied to the Euclidian Metric, which has conceptual properties that offer a faster computational performance for solving the clustering problem given by formulation (37), since the calculation of the second sum is very simple.

A fundamental question is the proper choice of the boundary parameter δ. Moreover, there are two main options for updating the boundary parameter δ, inside the internal minimization procedure or after it. For simplicity sake, the HSC method connected with the partition scheme presented below adopts the second option, which offers a better computational performance, in spite of an eventual violation of the δ≧Δx condition, which is corrected in the next partition update.

The AHSC-L2 Method Initialization Step:

Choose initial start point: x⁰;

Choose parameter values: γ¹, τ¹, ε¹;

Choose reduction factors: 0<ρ₁<1, 0<ρ₂<1, 0<ρ₃<1;

Specify the boundary band width: δ¹

Let k=1.

Main Step: Repeat until an arbitrary stopping rule is attained

For determining the Z_(δ)( x) and G_(δ)( x) partition, given by (29) and (30), use x=x^(k−1) and δ=δ^(k).

Calculate the centers v_(i), i=1, . . . , q of gravitational regions by using (41).

Solve problem (38) starting at the initial point x^(k−1) and let x^(k) be the solution obtained:

For solving the equations (40), associated to the first part given by (39), take the smoothing parameters: γ=γ^(k), τ=τ^(k) and ε=ε^(k);

For solving the second part, given by (46), use the above calculated centers of the gravitational regions.

Updating procedure:

Let γ^(k+1)=ρ₁γ^(k), τ^(k+1)=ρ₂τ^(k), ε^(k+1)=ρ₃ε^(k);

Redefine the boundary value: δ^(k+1);

Let k:=k+1.

Comments:

The above method does not include any procedure for considering the occurrence of empty gravitational regions. This possibility can be overcome by simply moving the centroids.

The efficiency of the AHSC-L2 (HSC Method Connected with the Boundary and Gravitational Regions Partition Scheme Applied to the Euclidian Metric) depends strongly on the parameter δ. A choice of a small value for it will imply an improper definition of the set G_(δ)( x) and frequent violation of the basic condition Δx<δ, for the validity of Proposition 1. Otherwise, a choice of a large value will imply a decrease in the number of gravitational observation points and, therefore, the computational advantages given by formulation (46) will be reduced.

As a general strategy, within first iterations, larger δ values must be used, because the centroid displacements are more expressive. The δ values must be gradually decreased in the same proportion of the decrease of these displacements.

Computational Results

The computational results presented below were obtained from a preliminary implementation of the Accelerated Hyperbolic Smoothing Clustering Applied to the Euclidian Metric Methodology by the AHSC-L2 method. The numerical experiments have been carried out on a PC Intel Celeron with 2.7 GHz CPU and 512 MB RAM. The programs are coded with Compac Visual FORTRAN, Version 6.1. The unconstrained minimization tasks were carried out by means of a Quasi-Newton algorithm employing the BFGS updating formula from the Harwell Library, obtained in the site: (http://www.cse.scitech.ac.uk/nag/hsl/).

In order to show the distinct performance of the AHSC-L2 Method, results obtained by solving a set of the largest problems of the TSP collection are shown below (http://www.iwr.uni-heidelberg.de/groups/comopt/software).

Table 1 presents the results for the TSPLIB-3038 data set. It exhibits the results produced by AHSC-L2 Method and, for comparison, those of two algorithms presented by Bagirov in “Modified Global k-means Algorithm for Minimum Sum-of-Squares Clustering Problems”, Pattern Recognition, Vol 41 Issue 10 pp 3192-3199, 2008. The first two columns show the number of clusters (q) and the best known value for the global optimum (ƒ_(opt)) taken from Bagirov (2008). The next columns show the error (E) for the best solution produced (ƒ_(Best)) and the mean CPU time (Time) given in seconds associated to three algorithms: multi-start k-means (MS k-means), modified global k-means (MGKM) and the proposed AHSM-L2. The errors are calculated in the following way: E=100(ƒ_(Best)−ƒ_(opt))/ƒ_(opt).

The multi-start k-means algorithm is the traditional k-means algorithm with multiple initial starting points. In this experiment, to find q clusters, 100 times q starting points were randomly chosen in the MS k-means algorithm. The global k-means (GKM) algorithm, introduced by A. Likas, M. Vlassis and J. Verbeek in “The Global k-means Clustering Algorithm”, Pattern Recognition, Vol 36 pp 451-461, 2003, is a significant improvement of the k-means algorithm. The MGKS is an improved version of the GKM algorithm proposed by Bagirov (2008). The AHSM-L2 solutions were produced by using starting points in all cases, except q=40 and q=50, where 20 and 40 starting points were taken, respectively.

TABLE 1 Results for the TSPLIB-3038 Instance MS k-means MGKM AHSC-L2 q f_(opt) E Time E Time E Time 2 0.31688E10 0.00 12.97 0.00 0.86 0.05 0.07 10 0.56025E09 0.00 11.52 0.58 3.30 0.01 0.28 20 0.26681E09 0.42 14.53 0.48 5.77 0.05 0.59 30 0.17557E09 1.16 19.09 0.67 8.25 0.31 0.86 40 0.12548E09 2.24 22.28 1.35 10.70 −0.11 1.09 50 0.98400E08 2.60 23.55 1.41 13.23 0.44 1.36 60 0.82006E08 5.56 27.64 0.98 15.75 −0.80 1.91 80 0.61217E08 4.84 30.02 0.63 20.94 −0.73 6.72 100 0.48912E08 5.99 33.59 0.00 26.11 −0.60 9.79

It is possible to observe in each row of Table 1 that the best solution produced by the new AHSC-L2 Method becomes significantly smaller than that by the MS k-means when the number of clusters q increases. In fact, this algorithm does not perform well for big instances, despite being one of most used ones. X. Wu et alli in “Top 10 algorithms in data mining”, Knowledge and Information Systems, Springer, 14, pp 1-37, 2008 present the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining in December 2006. The k-means assumes the second place in this list. The comparison between XHSC and MGKM solutions demonstrates a similar superiority of the proposed algorithm. In the same way, the comparison of the time columns shows a consistent speed advantage of the proposed algorithm over the older ones.

On the other hand, the best solution produced by the AHSC-L2 Method is very close to the putative global minimum, the best known solution of the TSPLIB-3038 instance. Moreover, in this preliminary experiment, by using a relatively small number of initial starting points, four new putative global minimum results (q=40, q=60, q=80 and q=100) have been established.

TABLE 2 Results for the Pla85900 Instance HSC AHSC-L2 q f_(Calculated) Occur. E_(Mean) Time_(Mean) Occur. E_(Mean) Time_(Mean) 2 0.37491E16 4 0.86 23.07 5 0.58 3.65 3 0.22806E16 10 0.00 47.41 7 0.04 4.92 4 0.15931E16 10 0.00 76.34 10 0.00 5.76 5 0.13397E16 1 0.80 124.32 1 1.35 7.78 6 0.11366E16 8 0.12 173.44 2 1.25 7.87 7 0.97110E15 4 0.42 254.37 1 0.87 9.33 8 0.83774E15 8 0.55 353.61 4 0.37 12.96 9 0.74660E15 3 0.68 438.71 1 0.25 13.00 10 0.68294E15 4 0.29 551.98 3 0.46 14.75

Table 2 presents the results for the Pla85900 data set. Ten different randomly chosen starting points were used. The first column presents the specified number of clusters (q). The second column presents the best objective function value (ƒ_(Calculated)) produced by the HSC Method and by the AHSC-L2 Method, both alternatives obtaining the same results within a 5 decimal digits precision. The following three columns give the particular data associated to the original HSC Method: the number of occurrences of the best solution, the average error of the 10 solutions (E_(Mean)) in relation to the best solution obtained and CPU mean time given in seconds. The last three columns give the same data associated to the new AHSC-L2 Algorithm.

The results presented in Table 2 show a coherent performance of both algorithms. It was impossible to find any record of solutions of this instance. Indeed, the clustering literature seldom considers instances with such high number of observations. The high number of occurrences of the best solution (Occur.) and the low values presented in columns (E_(Mean)) show a consistent performance of both algorithms. The principal issue, the comparison between the mean CPU time values, shows clearly the extra performance of the new proposed AHSC-L2 Method resulting from the very fast procedures associated to equations (46) and (47).

Table 3 presents the computational results produced by the AHSC-L2 Method for the largest instances of Symmetric Traveling Salesman Problem (TSPLIB) collection: FL3795, FNL4461, RL5915, RL5934, Pla7397, RL11849, USA13509, BRD14051, D15112, BRD18512 and Pla33810. For each instance, two cases are presented: q=5 and q=10. Ten different randomly chosen starting points were used. For each case, the table presents: the best objective function value produced by the AHSC-L2 Method (ƒ_(AHSC-L2) _(Best) ), the average error of the 10 solutions in relation to the best solution obtained (E_(Mean)) and CPU mean time given in seconds (Time_(Mean)).

TABLE 3 Results for larger instances of the TSPLIB collection q = 5 q = 10 Instance f_(AHSC-L2) _(Best) E_(Mean) Time_(Mean) f_(AHSC-L2) _(Best) E_(Mean) Time_(Mean) FL3795 0.368283E09 6.18 0.18 0.106394E09 2.30 0.26 FNL4461 0.181667E10 0.43 0.31 0.853304E09 0.36 0.52 RL5915 0.379585E11 1.01 0.45 0.187794E11 0.41 0.74 RL5934 0.393650E11 1.69 0.39 0.191761E11 2.35 0.76 Pla7397 0.506247E14 1.94 0.34 0.243486E14 2.10 0.80 RL11849 0.809552E11 1.11 0.83 0.369192E11 0.53 1.55 USA13509 0.329511E14 0.01 1.01 0.149816E14 1.39 1.69 BRD14051 0.122288E11 1.20 0.82 0.593928E10 1.17 2.00 D15112 0.132707E12 0.00 0.88 0.644901E11 0.71 2.27 BRD18512 0.233416E11 1.30 1.25 0.105912E11 1.05 2.24 Pla33810 0.335680E15 0.22 3.54 0.164824E15 0.68 5.14

It was impossible to perform a comparison, given the lack of records of solutions of these instances. Indeed, the clustering literature seldom considers instances with such high number of observations. Only a possible remark: the low values presented in columns (E_(mean)) show a consistent performance of the proposed alg

Boundary and Gravitational Regions Partition Methodology Applied to the Manhattan Met

In this section, a faster procedure for the minimum sum of distances formulation is presented, where the L1 metric appears in the distance specification. It is a particular case of the general formulation (5):

$\begin{matrix} {{{minimize}{\mspace{11mu} \;}{\sum\limits_{j = 1}^{m}\; z_{j}}}{{{{subject}\mspace{14mu} {to}\mspace{14mu} z_{j}} = {\min\limits_{{i = 1},L,q}{{s_{j} - x_{i}}}_{1}}},{j = 1},L,m}} & (49) \end{matrix}$

In the same way, the basic idea of the partition of the set of observations into two non overlapping parts will be used, since the results of Proposition 1 are valid for any metric. Considering the same partition scheme, the objective function of problem (49) can be expressed in the separated form:

$\begin{matrix} {{{minimize}{\mspace{11mu} \;}{\sum\limits_{j = 1}^{m}\; {z_{j}(x)}}} = {{\sum\limits_{j \in J_{B}}\; {z_{j}(x)}} + {\sum\limits_{j \in J_{a}}\; {z_{j}(x)}}}} & (50) \end{matrix}$

so that the objective function can be presented in the form:

minimize F(x)=F _(B)(x)+F _(G)(x)  (51)

where the two components are completely independent.

The first part of expression (50), associated to the boundary observations, can be calculated by using the previously presented smoothing approach, as in (19) and (20):

$\begin{matrix} {{{{minimize}\mspace{14mu} {F_{B}(x)}} = {\sum\limits_{j \in J_{B}}\; {z_{j}(x)}}},} & (52) \end{matrix}$

where each z_(j)(x) results from the calculation of a zero of each equation

$\begin{matrix} {{{h_{j}\left( {z_{j},x} \right)} = {{{\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - {\theta_{1}\left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}} - ɛ} = 0}},{j \in J_{B}}} & (53) \end{matrix}$

The second part of expression (51) can be calculated by using a faster procedure, as will be shown right away. Let be again x _(i), i=1, . . . , q be a referential position of centroids of the clusters taken in the iterative process used for performing the partition.

First, let us define the subsets of gravitational observations associated to each referential centroid:

$\begin{matrix} {{J_{i}\left( \overset{\_}{x} \right)} = \left\{ {\left. {j \in J_{G}} \middle| {\min\limits_{l = {1\mspace{14mu} \ldots \mspace{14mu} q}}{{s_{j} - {\overset{\_}{x}}_{l}}}} \right. = {{s_{j} - {\overset{\_}{x}}_{i}}}} \right\}} & (54) \end{matrix}$

Let us consider the second sum of expression (50).

$\begin{matrix} \begin{matrix} {{{minimize}\mspace{14mu} {F_{G}(x)}} = {\sum\limits_{j \in J_{G}}\; {z_{j}(x)}}} \\ {= {\sum\limits_{i = 1}^{q}\; {\sum\limits_{j \in J_{i}}\; {{s_{j} - x_{i}}}_{1}}}} \\ {= {\sum\limits_{i = 1}^{q}\; {\sum\limits_{j = J_{i}}\; {\sum\limits_{l = 1}^{n}\; {{{s_{j}^{l} - x_{i}^{l}}}(56)}}}}} \\ {= {\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; {\sum\limits_{j = J_{i}}\; {{{s_{j}^{l} - x_{i}^{l}}}(57)}}}}} \end{matrix} & (55) \end{matrix}$

Let us now perform the partition of each set J_(i) into 3 subsets or each component in the following form:

J _(il) ⁺( x )={jεJ _(i)( x )|s _(j) ^(l) − x _(i) ^(l)≧δ}  (58)

J _(il) ⁻( x )={jεJ _(i)( x )|s _(j) ^(l) − x _(i) ^(l)≦−ε}  (59)

J _(il) ⁰( x )={jεJ _(i)( x )|−δ<s _(j) ^(l) − x _(i) ^(l)<+δ}  (60)

By using the defined subsets, it is obtained:

${{minimize}\mspace{14mu} {F_{G}(x)}} = {{\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; \left\lbrack {{\sum\limits_{j \in J_{il}^{+}}\; {{s_{j}^{l} - x_{i}^{l}}}} + {\sum\limits_{j \in J_{il}^{-}}\; {{s_{j}^{l} - x_{i}^{l}}}} + {\sum\limits_{j \in J_{il}^{0}}\; {{s_{j}^{l} - x_{i}^{l}}}}} \right\rbrack}} = {\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; \begin{bmatrix} {{\sum\limits_{j \in J_{il}^{+}}\; {{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l} + {\overset{\_}{x}}_{i}^{l} - x_{i}^{l}}}} +} \\ {{\sum\limits_{j \in J_{il}^{-}}\; {{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l} + {\overset{\_}{x}}_{i}^{l} - x_{i}^{l}}}} + {\sum\limits_{j \in J_{il}^{0}}\; {{s_{j}^{l} - x_{i}^{l}}}}} \end{bmatrix}}}}$

Let us define the component displacement of centroid Δx_(i) ^(l)=x_(i) ^(l)− x _(i) ^(l). Since |Δx_(i) ^(l)|<δ, from the above definitions of the subsets, it follows that:

|s _(j) ^(l) −x _(i) ^(l) |=|s _(j) ^(l) − x _(i) ^(l) ∥−Δx _(i) ^(l) for jΔJ _(il) ⁺

|s _(j) ^(l) −x _(i) ^(l) |=|s _(j) ^(l) − x _(i) ^(l) ∥−Δx _(i) ^(l) for jΔJ _(il) ⁻  (61)

So, it follows:

$\begin{matrix} {{{minimize}\mspace{14mu} {F_{G}(x)}} = {{\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; \begin{bmatrix} {{\sum\limits_{j \in J_{il}^{+}}\; \left( {{{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l}}} - {\Delta \; x_{i}^{l}}} \right)} +} \\ {{\sum\limits_{j \in J_{il}^{-}}\; \left( {{{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l}}} - {\Delta \; x_{i}^{l}}} \right)} + {\sum\limits_{j \in J_{il}^{0}}\; {{s_{j}^{l} - x_{i}^{l}}}}} \end{bmatrix}}} = {\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; \begin{bmatrix} {{\sum\limits_{j \in J_{il}^{+}}\; {{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l}}}} - {{J_{il}^{+}}\Delta \; x_{i}^{l}} +} \\ {{\sum\limits_{j \in J_{il}^{-}}\; {{{s_{j}^{l} - {{\overset{\_}{x}}_{i}^{l}{ + }J_{il}^{-}}}}\Delta \; x_{i}^{l}}} + {\sum\limits_{j \in J_{il}^{0}}\; {{s_{j}^{l} - x_{i}^{l}}}}} \end{bmatrix}}}}} & (62) \end{matrix}$

where |J_(il) ⁺| and |J_(il) ⁻| are the cardinalities two first subsets.

When the position of centroids x moves within the iterative process, the value of the first two sums of (62) assumes a constant value, since the values s_(j) ^(l) and x _(i) ^(l) are fixed. So, to evaluate F_(G)(x) it is only necessary to calculate the displacements Δx_(i) ^(l), l=1, . . . , n; i=1, . . . , q, and evaluate the last sum, that normally has only a few number of terms because δ assumes in general a relatively small value.

The function F_(G)(x) above specified is nondifferentiable due the last sum, so in order to use gradient information, it is necessary to use a smooth approximation.

$\begin{matrix} {{{{minimize}\mspace{14mu} {F_{G}(x)}} = {\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; \left\lbrack {{\sum\limits_{j \in J_{il}^{+}}\; {{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l}}}} - {{J_{il}^{+}}\Delta \; x_{i}^{l}} + {\sum\limits_{j \in J_{il}^{-}}\; {{s_{j}^{l} - {\overset{\_}{x}}_{i}^{l}}}} + {{J_{il}^{-}}\Delta \; x_{i}^{l}} + {\sum\limits_{j \in J_{il}^{0}}\; {\theta \left( {s_{j}^{l},x_{i}^{l},\gamma} \right)}}} \right\rbrack}}},\mspace{79mu} {{{where}\mspace{14mu} {\theta \left( {s_{j}^{l},x_{i}^{l},\gamma} \right)}} = {\sqrt{\left( {s_{j}^{l} - x_{i}^{l}} \right)^{2} + \gamma^{2}}.}}} & (63) \end{matrix}$

So, the gradient of the smoothed second part of objective function is easily calculated by:

$\begin{matrix} {{{\nabla{f_{G}(x)}} = {\sum\limits_{i = 1}^{q}\; {\sum\limits_{l = 1}^{n}\; {\left\lbrack {{- {J_{il}^{+}}} + {J_{il}^{-}} + {\sum\limits_{j \in J_{il}^{0}}\; {{- \left( {s_{j}^{l} - x_{i}^{l}} \right)}/{\theta \left( {s_{j}^{l},x_{i}^{l},\gamma} \right)}}}} \right\rbrack e_{il}}}}};} & (64) \end{matrix}$

where e_(il) stands for a unitary vector with the component l of centroid i equal to 1.

Therefore, if δ≧Δx was observed within the iterative process, the calculation of the expression Σ_(jεJ) _(G) z_(j)(x) and its gradient can be exactly performed by very fast procedures, equations (63) and (64).

By using the above results, it is possible to construct a specific method, the Accelerated Hyperbolic Smoothing Clustering Method Applied to the Manhattan Metric, which has conceptual properties to offer a faster computational performance for solving the clustering problem given by formulation (50), since the calculation of the second sum is very simple.

A fundamental question is the proper choice of the boundary parameter δ. Moreover, there are two main options for updating the boundary parameter δ, inside the internal minimization procedure or after it. For simplicity sake, the HSC method connected with the partition scheme presented below adopts the second option, which offers a better computational performance, in spite of an eventual violation of the δ≧Δx condition, which is corrected in the next partition update.

The AHSC-L1 Method Initialization Step:

Choose initial start point: x⁰;

Choose parameter values: γ¹, τ¹, ε¹;

Choose reduction factors: 0<ρ₁<1, 0<ρ₂<1, 0<ρ₃<1;

Specify the boundary band width: δ¹

Let k=1.

Main Step: Repeat until an arbitrary stopping rule is attained

For determining the Z_(δ)( x) and G_(δ)( x) partition, given by (29) and (30), use x=x^(k+1) and δ=δ^(k).

Determine the subsets J_(il) ⁺, J_(il) ⁻ and J_(il) ⁰ and calculate the cardinalities of two first sets: |J_(il) ⁺| and |J_(il) ⁻|.

Solve problem (51) starting at the initial point x^(k−1) and let x^(k) be the solution obtained.

For solving the equations (53), associated to the first part given by (52), take the smoothing parameters: γ=γ^(k), τ=τ^(k) and ε=ε^(k);

For solving the second part, given by (63), use the above determined subsets and their cardinalities.

Updating procedure:

Let γ^(k+1)=ρ₁γ^(k), τ^(k+1)=ρ₂τ^(k), ε^(k+1)=ρ₃ε^(k),

Redefine the boundary value: δ^(k+1)

Let k:=k+1.

Comments:

The above method does not include any procedure for considering the occurrence of empty gravitational regions. This possibility can be overcome by simply moving the centroids.

The efficiency of the AHSC-L1 (HSC Method Connected with the Boundary and Gravitational Regions Partition Scheme Applied to the Manhattan Metric) depends strongly on the parameter δ. A choice of a small value for it will imply an improper definition of the set G_(δ)( x) and frequent violation of the basic condition Δx<δ, for the validity of Proposition 1. Otherwise, a choice of a large value will imply a decrease in the number of gravitational observation points and, therefore, the computational advantages given by formulation (63) will be reduced.

As a general strategy, within first iterations, larger δ values must be used, because the centroid displacements are more expressive. The δ values must be gradually decreased in the same proportion of the decrease of these displacements.

Hyperbolic Smoothing Minimum Distance

The minimum distance of a given point to a set of points in a Euclidian multidimensional space is a common component in a broad class of mathematical problems.

In the clustering problem presentation, it was shown that for each observation point s_(j) of the set S, we must calculate the distance z_(j) from s_(j) to the nearest centroid:

$\begin{matrix} {{z_{j} = {\min\limits_{x_{i} \in X}{{s_{j} - x_{i}}}}},} & (65) \end{matrix}$

where ∥s_(j)−x_(i)∥ is the distance following an arbitrary metric.

The kernel component of the General Hyperbolic Smoothing Clustering Methodology is to use a hyperbolic smoothing approximation z_(i)(x), or Hyperbolic Smoothing Minimum Distance, obtained from the calculation of the zero of each function

$\begin{matrix} {{{h_{j}\left( {z_{j},x} \right)} = {{\sum\limits_{i = 1}^{q}\; {\varphi \left( {{z_{j} - {\theta \left( {s_{j},x_{i},\gamma} \right)}},\tau} \right)}} - ɛ}},{j = 1},L,{m.}} & (66) \end{matrix}$

The calculation of the minimum distance is a sub-problem that exists in a broad class of problems, not only in clustering. The calculation in the original form is nondifferentiable and presents strong numerical difficulties. On the other side, the Hyperbolic Smoothing Minimum Distance offers a C^(∞) differentiable approximation. The HSC Method is based on the solution of a sequence of problems obtained by a gradual decreasing of the parameters τ, ε, and γ to 0. A similar approach is a general strategy for solving all problems belonging to this class.

Moreover, any calculation of a minimum (or a maximum) value of a set of values can be reduced to an equivalent task. Let us consider:

$\begin{matrix} {\mspace{79mu} {{w = {\text{?}\left( {t_{i}(x)} \right)}},{\text{?}\text{indicates text missing or illegible when filed}}}} & (67) \end{matrix}$

where t_(i)(x), i=1, . . . , q are arbitrary functions. In a similar way, a hyperbolic smoothing approximation of w can be obtained from the calculation of the zero of the function

$\begin{matrix} {\mspace{79mu} {{h\left( {w,x} \right)} = {{\text{?}{\varphi \left( {{w - {t_{i}(x)}},\tau} \right)}} - {{ɛ.\text{?}}\text{indicates text missing or illegible when filed}}}}} & (68) \end{matrix}$

By using the Implicit Function Theorem, the gradient is obtained:

$\begin{matrix} {{\nabla{w(x)}} = {{- {\nabla{h\left( {w,x} \right)}}}/{\frac{\partial{h_{j}\left( {w,x} \right)}}{\partial w}.}}} & (69) \end{matrix}$

A short list of such problems involving minimum distance or minimum value calculations is presented below:

Min-max problems;

Min-max-min problems;

Min-sum-min problems;

Covering a plane region by circles;

Covering a tridimensional body by spheres;

-   -   Gamma knife problem

Location problems;

-   -   Facilities location problems;     -   Hub and spike location problems;     -   Steiner problems;     -   Minisum and minimax location problems;

Districting problems;

Packing problems;

Scheduling problems;

Geometric distance problems;

-   -   Protein folding problems;

Spherical point arrangements problems;

-   -   Elliptic Fekete problem;     -   Fekete problem;     -   Power sum problem;     -   Tammes (Hard-spheres) problem.

In short, any problem having components described by expressions similar to equation (65) can be smoothed by using the Hyperbolic Smoothing Minimum Distance procedure so that the original problem can be solved by the gradual decreasing of the parameters following the same intrinsic idea as in the Hyperbolic Smoothing General Clustering Methodology. 

1. A method for the unsupervised clustering of a given set of m observations s_(j), j=1, . . . , m, each represented by n components, into a specified number of groups q, represented by a set of centroids x_(i), i=1, . . . , q, following a criterion of minimizing the sum of the values of a monotonic increasing function ƒ applied to each argument z_(j), defined as the shortest distance from each generic observation s_(j) to the nearest centroid, according to the procedures detailed in the section on the General Hyperbolic Smoothing Clustering Methodology. It should be further said that distance measures, both in regards to this claim as well as to the following ones, can be performed according to different metrics, such as the well-known Euclidean or 2-norm, 1-norm, p-norm and infinity norm, or any other norm with similar mathematical properties to the four mentioned norms.
 2. A method for the unsupervised clustering, according to claim 1, following the procedures for partition of the set of observations articulated by the hyperbolic smoothing strategy, as detailed in the section on Boundary and Gravitational Regions Partition Methodology.
 3. A procedure for the partition of a set of observations into Boundary and Gravitational Regions, in isolation or coordinated with another clustering algorithm as, for example, one of the large family of k-means algorithms, as a component of a methodology for unsupervised clustering, as defined in claim
 1. 4. A method for unsupervised clustering, according to claim 1, following the criterion known as minimum-sum-of-squares clustering (MSSC), according to the procedures detailed in the section on Boundary and Gravitational Regions Partition Methodology Applied to the Euclidian Metric.
 5. A method for unsupervised clustering, according to claim 1, following the criterion known as minimum-sum-of-absolute-values clustering problem, according to the procedures detailed in the section on Boundary and Gravitational Regions Partition Methodology Applied to the Manhattan Metric.
 6. A method for the smoothed evaluation of the minimum distance from one point to a number of other points, by calculating the zero of the equation number (20), in formalizing and solving problems of location, districting, covering, packing, scheduling, geometric distance, Gamma Knife, Steiner's or others, where there is a need to calculate such a distance, according to the procedures detailed in the section on Hyperbolic Smoothing Minimum Distance.
 7. A method for gradually making the approximation of the smallest distance from one point to a number of other points, according to claim 6, so close to the exact distance as desired, through the use of gradual decreasing of the smoothing parameters, following similar procedures detailed in the General Hyperbolic Smoothing Clustering Methodology.
 8. A method for calculating the gradient of the smoothed evaluation of the shortest distance from one point to a number of other points, according to claim 6, through expression number (22).
 9. A method for the smoothed evaluation of the minimum value from of a set of values by calculating the zero of the equation number (68), according to the procedures detailed in the section on Hyperbolic Smoothing Minimum Distance.
 10. A method for calculating the gradient of the smoothed evaluation of the minimum value from of a set of values, according to claim 9, through expression number (69).
 11. (canceled)
 12. (canceled)
 13. (canceled) 